feat: harden model-engine runtime on chainguard #809

Merged
scale-ballen merged 22 commits into main from codex/model-engine-chainguard-runtime
Apr 20, 2026

@scale-ballen commented Apr 16, 2026

Summary

  • migrate model-engine from python:3.13-slim to public Chainguard python:latest-dev / python:latest
  • update the Python dependency set for Python 3.14 compatibility and current security fixes
  • preserve runtime boot by carrying the required git executable and shared libraries into the minimal runtime image

What Changed

  • model-engine/Dockerfile
    • switched builder to cgr.dev/chainguard/python:latest-dev
    • switched runtime to cgr.dev/chainguard/python:latest
    • replaced the old Debian/apt-get flow with apk in the builder stage
    • removed the old Debian runtime package install path and other runtime baggage from the previous image shape
    • build the Python environment in a venv and copy only the needed runtime artifacts forward
    • copy service_configs into the image so the gateway startup path can resolve service_config_circleci.yaml
    • copy git, git-core, and the required runtime libraries (libpcre2-8, libz) from the builder stage into the final image so GitPython-backed startup imports still work
  • model-engine/requirements.in
    • bump ddtrace to >=4.7.1,<5.0
    • bump numpy to >=2.4.4,<2.5
    • bump google-cloud-artifact-registry to ~=1.21.0
    • bump psycopg2-binary to 2.9.11
    • add pytz>=2024.1
    • bump pydantic to 2.12.5
  • model-engine/requirements.txt
    • refresh the transitive set needed for Python 3.14 and the updated base image, including:
      • ddtrace 4.7.1
      • envier 0.6.1
      • numpy 2.4.4
      • grpcio 1.75.1
      • grpcio-status 1.75.1
      • protobuf 6.33.5
      • google-cloud-artifact-registry 1.21.0
      • psycopg2-binary 2.9.11
      • pytz 2025.2
      • pydantic 2.12.5
      • pydantic-core 2.41.5

Why

The previous model-engine image was based on python:3.13-slim and carried a large Debian OS vulnerability surface. Moving to the public Chainguard Python images materially reduces OS exposure, but because public Chainguard currently tracks Python 3.14, the repo also needed a coordinated dependency refresh to restore build and runtime compatibility.

The runtime boot path also imports GitPython-backed ECR helpers during gateway startup. A minimal runtime image therefore still needs the git executable and its required shared libraries present, even though the rest of the image is aggressively minimized.
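A minimal sketch of a pre-flight check for this requirement, assuming only that the runtime needs `git` resolvable on `PATH`. The helper name `missing_executables` is hypothetical and is not part of this PR.

```python
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that cannot be resolved on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in names if shutil.which(name) is None]
```

Calling `missing_executables(["git"])` inside the runtime image should return an empty list if the binary was copied forward. It does not verify that the shared libraries `git` links against are present, so it complements the gateway boot test rather than replacing it.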

Validation

Build-time

  • docker build --platform linux/amd64 --progress=plain -f model-engine/Dockerfile -t llm-engine-chainguard-min:local .
    • passed

Runtime

  • direct import smoke:
    • import model_engine_server
    • passed
  • gateway boot with temporary credentials exported from the local production-developer AWS profile and explicit local test env:
    • GIT_TAG=test
    • CIRCLECI=true
    • AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN
  • health check:
    • GET /healthz returned 200
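
The health check above could be scripted roughly as follows. This is a sketch, not the command used for this PR: the function name `is_healthy` and the base URL passed to it are assumptions; only the `/healthz` path and the expected 200 come from the validation notes.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True when GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    # An empty ProxyHandler bypasses any proxy configured in the environment,
    # which is usually what you want when probing a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError, so refused
        # connections, timeouts, and non-2xx responses all land here.
        return False
```

For example, `is_healthy("http://127.0.0.1:5000")` would probe a gateway published on a hypothetical local port 5000.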

Security scan

  • trivy image --scanners vuln --severity CRITICAL,HIGH --format json -o /tmp/llm-engine-chainguard-min-final-trivy.json llm-engine-chainguard-min:local
  • final result:
    • 0 CRITICAL
    • 0 HIGH
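
To turn the JSON report into a pass/fail gate, something like the sketch below could tally findings by severity. It assumes the usual Trivy JSON layout (a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field). The helper names `count_severities` and `is_clean` are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable


def count_severities(report: Dict[str, Any]) -> Counter:
    """Tally vulnerabilities by severity across all results in a Trivy report."""
    counts: Counter = Counter()
    # Both keys may be missing or null when a target has no findings.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def is_clean(report: Dict[str, Any], blocking: Iterable[str] = ("CRITICAL", "HIGH")) -> bool:
    """Return True when the report has no findings at a blocking severity."""
    counts = count_severities(report)
    return all(counts[severity] == 0 for severity in blocking)


def load_report(path: str) -> Dict[str, Any]:
    """Read a report written with `trivy image --format json -o <path>`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With the report from the command above, `is_clean(load_report("/tmp/llm-engine-chainguard-min-final-trivy.json"))` would be expected to return `True` for the result reported here.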

Notes

  • The runtime still emits warning-level noise during startup, including:
    • no PyTorch / TensorFlow / Flax present in this image
    • a ddtrace warning about Pydantic v1 functionality on Python 3.14
    • a few Python syntax warnings in existing repo code
  • Those warnings did not block gateway startup or the /healthz check.

Greptile Summary

This PR migrates the model-engine runtime from python:3.13-slim to the Chainguard python:latest-dev/python:latest image pair, achieving a 0 CRITICAL / 0 HIGH Trivy result, and coordinates a full Python 3.14-compatible dependency refresh (ddtrace 4.7.1, pydantic 2.12.5, protobuf 6.33.5, grpcio 1.75.1, numpy 2.4.4). It also refactors the remote Docker-build pipeline with cleaner archive filtering, a proper path-normalization helper for build args, and new unit-test coverage for previously untested helpers.

The prior round of review flagged several regressions that are now resolved — kubectl is built from source and copied into the runtime, libcurl/libreadline/libtinfo are collected alongside libz/libpcre2, TARGETARCH is threaded through both Go compilation steps, and the lstrip("./") dotfile-mangling bug is fixed with removeprefix. The BUILD_CONTEXT_TEMP_ROOT being placed inside model-engine/ (so accumulated temp dirs are included in every subsequent build archive) remains unaddressed from the prior round.
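
For context on the `lstrip("./")` fix: `str.lstrip` treats its argument as a set of characters, not as a prefix, so it also eats the leading dot of dotfiles and any `../` segments. The snippet below illustrates the difference; the function names are hypothetical and are not taken from `remote_build.py`.

```python
def strip_dot_slash_buggy(path: str) -> str:
    """Old behavior: strips every leading '.' and '/' character."""
    return path.lstrip("./")


def strip_dot_slash(path: str) -> str:
    """Fixed behavior: removes one literal './' prefix (Python 3.9+)."""
    return path.removeprefix("./")


# Ordinary paths are handled identically by both helpers...
assert strip_dot_slash_buggy("./src/app.py") == strip_dot_slash("./src/app.py") == "src/app.py"
# ...but the character-set version mangles dotfiles.
assert strip_dot_slash_buggy("./.dockerignore") == "dockerignore"
assert strip_dot_slash("./.dockerignore") == ".dockerignore"
```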

Confidence Score: 4/5

Safe to merge once the BUILD_CONTEXT_TEMP_ROOT leakage concern from the prior review is resolved; all other previously flagged regressions (kubectl, libcurl, libreadline, TARGETARCH, lstrip) are now fixed.

All new-round findings are P2 or lower. The one open P1 (BUILD_CONTEXT_TEMP_ROOT inside model-engine/ causing cumulative archive pollution and potential bundle leakage across builds) was raised in the prior review thread and is not yet addressed in this revision, preventing a 5.
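
One way to keep a nested temp root out of the archive, sketched with the standard-library `tarfile` filter hook. This is an illustration of the concern, not the logic in `remote_build.py`: the function name `archive_context` is hypothetical, and `.build-context` is taken from the directory name mentioned in this review.

```python
import tarfile
from pathlib import Path
from typing import Optional


def archive_context(context_dir: Path, output: Path, excluded_root: str) -> None:
    """Write a tar.gz of context_dir, skipping excluded_root and everything under it."""

    def _exclude(member: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        name = member.name
        # Match the directory itself or anything below it, but not siblings
        # that merely share the prefix (e.g. ".build-contextual.txt").
        if name == excluded_root or name.startswith(excluded_root + "/"):
            return None  # returning None drops the member from the archive
        return member

    with tarfile.open(output, "w:gz") as tar:
        # Add each top-level entry under its own name so member names are
        # relative to the context root.
        for child in sorted(context_dir.iterdir()):
            tar.add(str(child), arcname=child.name, filter=_exclude)
```

The output path should live outside `context_dir`; otherwise the archive being written would itself be picked up. Moving the temp root out of the archived tree altogether avoids the need for a filter in the first place.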

model-engine/model_engine_server/infra/services/live_endpoint_builder_service.py (BUILD_CONTEXT_TEMP_ROOT placement), model-engine/Dockerfile (builder stage pulls the entire Kubernetes repo to compile kubectl — very long build times)

Important Files Changed

| Filename | Overview |
| --- | --- |
| model-engine/Dockerfile | Major migration from python:3.13-slim to Chainguard Python images; adds git, kubectl (built from source), aws-iam-authenticator, bash, dumb-init, libcurl, libreadline, libtinfo, libz, and libpcre2 to the runtime; addresses all missing-binary/library regressions called out in prior review threads; library copies now use globs to avoid versioned-filename fragility |
| model-engine/model_engine_server/core/docker/remote_build.py | Refactored to extract zip_context helpers (_read_ignore_patterns, _normalize_path_for_archive, _filter_archive_member); fixes the old lstrip("./") bug with removeprefix("./"); nested-archive-root exclusion logic correctly prevents double-inclusion of build-context temp dirs |
| model-engine/model_engine_server/infra/repositories/ecr_docker_repository.py | Added _normalize_build_args to convert absolute substitution paths to relative paths using Path.is_relative_to (Python 3.9+) and an explicit base-path equality guard; build_image now passes normalized args to build_remote_block |
| model-engine/model_engine_server/infra/services/live_endpoint_builder_service.py | Added BUILD_CONTEXT_TEMP_ROOT constant (placed inside model-engine/) and _create_build_context_dir helper; the temp-root location inside model-engine/ means the full .build-context tree is included in every subsequent tar archive, allowing previously written bundle .zip files to leak across builds; unresolved from prior review |
| model-engine/tests/unit/core/docker/test_remote_build.py | New unit test file covering _read_ignore_patterns, _normalize_path_for_archive, _filter_archive_member, zip_context, start_build_job, and build_remote; parametrized cases correctly document that simple globs (no /) only match top-level archive entries |
| model-engine/tests/unit/infra/repositories/test_ecr_docker_repository.py | New unit tests for _normalize_build_args (inside/outside/relative/non-string paths) and build_image folder/build-arg assembly; includes explicit test for base-path-itself case |
| model-engine/model_engine_server/common/dtos/llms/vllm.py | Adds cast import; changes stop_token_ids default_factory from list to lambda: cast(List[int], []) to satisfy Pydantic 2.12.5 strict type inference; runtime behavior unchanged |
| model-engine/model_engine_server/db/migrations/run_database_migration.sh | Replaces dirname (external coreutil absent from Chainguard runtime) with ${BASH_SOURCE[0]%/*} pure-bash path stripping; also fixes missing newline at EOF and quotes $DIR expansion |
| charts/model-engine/templates/cacher_deployment.yaml | Readiness probe switched from cat /tmp/readyz to bash -c test -f /tmp/readyz to work with the Chainguard minimal image where cat may not be available |
| charts/model-engine/templates/endpoint_builder_deployment.yaml | Same readiness probe change as cacher_deployment.yaml: cat replaced with bash -c test -f |
| model-engine/requirements.in | Bumps ddtrace, numpy, google-cloud-artifact-registry, psycopg2-binary, pydantic; adds pytz for Python 3.14 compatibility |
| model-engine/requirements.txt | Refreshed transitive dependency lockfile for Python 3.14 / Chainguard compatibility; major version bumps include ddtrace 4.7.1, protobuf 6.33.5, pydantic 2.12.5/pydantic-core 2.41.5, grpcio 1.75.1 |
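
The build-arg normalization described for ecr_docker_repository.py can be pictured roughly as below. This is a sketch reconstructed from the summary only, not the merged implementation: the public name `normalize_build_args`, its signature, and the choice to leave the base path itself untouched are assumptions. `PurePosixPath` is used so the example behaves the same on any host.

```python
from pathlib import PurePosixPath
from typing import Any, Dict


def normalize_build_args(build_args: Dict[str, Any], base_path: str) -> Dict[str, Any]:
    """Rewrite absolute paths under base_path as paths relative to it."""
    base = PurePosixPath(base_path)
    normalized: Dict[str, Any] = {}
    for key, value in build_args.items():
        if isinstance(value, str):
            candidate = PurePosixPath(value)
            # Only rewrite absolute paths strictly inside the base; the base
            # itself, outside paths, and relative values pass through as-is.
            if (
                candidate.is_absolute()
                and candidate != base
                and candidate.is_relative_to(base)  # Python 3.9+
            ):
                value = candidate.relative_to(base).as_posix()
        normalized[key] = value
    return normalized
```

For instance, with a hypothetical base of `/workspace`, a value of `/workspace/model-engine/requirements.txt` becomes `model-engine/requirements.txt`, while `/etc/hosts`, `relative/path`, and non-string values are returned unchanged.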

Sequence Diagram

sequenceDiagram
    participant EB as EndpointBuilder Pod (Chainguard runtime)
    participant S3 as S3
    participant K8s as Kubernetes API
    participant Kaniko as Kaniko Pod

    EB->>EB: _create_build_context_dir() under WORKSPACE/model-engine/.build-context/
    EB->>EB: _normalize_build_args() rewrite abs paths to relative
    EB->>EB: zip_context() _filter_archive_member() excludes nested roots and ignore patterns
    EB->>S3: upload tar.gz build context
    EB->>K8s: kubectl patch secret codeartifact-pip-conf
    EB->>K8s: kubectl apply -f kaniko-job.yaml
    K8s->>Kaniko: start Kaniko pod
    Kaniko->>S3: download tar.gz context
    Kaniko->>Kaniko: docker build and push to ECR
    EB->>K8s: watch pod status (kubernetes Python client)
    K8s-->>EB: pod Succeeded/Failed
    EB->>K8s: kubectl logs (read final build output)

Reviews (22): Last reviewed commit: "fix: skip rewriting build context root a..."

@lilyz-ai self-requested a review April 17, 2026 22:09
@scale-ballen merged commit 2e9d007 into main Apr 20, 2026
8 checks passed
@scale-ballen deleted the codex/model-engine-chainguard-runtime branch April 20, 2026 17:57